Deep Learning-Based Telephony Speech Recognition in the Wild

Authors

Kyu J. Han

Seongjun Hahm

Byung-Hak Kim

Jungsuk Kim

Ian R. Lane

Abstract

In this paper, we explore the effectiveness of a variety of Deep Learning-based acoustic models for conversational telephony speech, specifically TDNN, bLSTM and CNN-bLSTM models. We evaluated these models both on research test sets, such as Switchboard and CallHome, and on recordings from a real-world call-center application. Our best single system, consisting of a single CNN-bLSTM acoustic model, obtained a WER of 5.7% on the Switchboard test set, and in combination with other models a WER of 5.3% was obtained. On the CallHome test set, a WER of 10.1% was achieved with model combination. On the test data collected from real-world call centers, even with model adaptation using application-specific data, the WER was significantly higher at 15.0%. We performed an error analysis on the real-world data and highlight the areas where speech recognition still faces challenges.
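All results above are reported as word error rate (WER). The sketch below is a minimal illustration. It assumes the standard definition: substitutions, deletions and insertions from a word-level Levenshtein alignment, divided by the number of reference words. It also assumes plain whitespace tokenization with no text normalization. The function name `word_error_rate` and the example sentences are hypothetical. The scoring tool and normalization rules used in the paper are not stated on this page.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N from a word-level Levenshtein alignment.

    Hypothetical helper: tokenizes on whitespace only, with no
    normalization of case, punctuation, or filler words.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "i would like to check my account balance"
    hyp = "i would like to check the account balance please"
    # One substitution (my -> the) plus one insertion (please) over 8 words
    print(word_error_rate(ref, hyp))  # 0.25
```

Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference. The metric is therefore an error rate, not an accuracy complement.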


Similar resources

Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...


Combining pattern recognition and deep-learning-based algorithms to automatically detect commercial quadcopters using audio signals (Research Article)

Commercial quadcopters with many private, commercial, and public sector applications are a rapidly advancing technology. Currently, there is no guarantee to facilitate the safe operation of these devices in the community. Three different automatic commercial quadcopters identification methods are presented in this paper. Among these three techniques, two are based on deep neural networks in whi...


Convolutional Neural Network with Adaptive Windows for Speech Recognition

Although speech recognition systems are widely used and their accuracy continues to improve, there is still a considerable performance gap between their accuracy and human recognition ability. This is partially due to high speaker variability in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...


Recognizing the Emotional State Changes in Human Utterance by a Learning Statistical Method based on Gaussian Mixture Model

Speech is one of the most opulent and instant methods to express emotional characteristics of human beings, which conveys the cognitive and semantic concepts among humans. In this study, a statistical-based method for emotional recognition of speech signals is proposed, and a learning approach is introduced, which is based on the statistical model to classify internal feelings of the utterance....


Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...




Publication date: 2017
